
    Text Mining for Pathway Curation

    Biological knowledge often involves understanding the interactions between molecules, such as proteins and genes, that form functional networks called pathways. New knowledge about pathways is typically communicated through publications and later condensed into structured formats such as textbooks, pathway databases or mathematical models. However, curating updated pathway models can be labour-intensive because of the growing volume of publications. This thesis investigates text mining methods to support pathway curation. We present PEDL (Protein-Protein-Association Extraction with Deep Language Models), a machine learning model designed to extract protein-protein associations (PPAs) from biomedical text. PEDL uses distant supervision and pre-trained language models and achieves higher accuracy than state-of-the-art methods. An expert evaluation confirms its usefulness for pathway curators. We also present PEDL+, a command-line tool that allows non-expert users to extract PPAs efficiently. When PEDL+ was applied to pathway curation tasks, three curators rated 55.6% to 79.6% of its extractions as useful for their work. The large number of PPAs identified by text mining can be overwhelming for researchers. To address this, we present PathComplete, a model that suggests potential extensions to a pathway. It is the first method for this task based on supervised machine learning, using transfer learning from pathway databases. Our evaluations show that PathComplete significantly outperforms existing methods. Finally, we generalise pathway extension from PPAs to more realistic complex events. Here, our novel method for conditional graph modification outperforms the current best method by 13-24% accuracy on three benchmarks. We also present a new dataset for event-based pathway extension.
Overall, our results show that deep learning-based information extraction is a promising basis for supporting pathway curators.
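
Distant supervision, which PEDL relies on, derives training labels by aligning an existing database with text instead of annotating sentences by hand. The sketch below illustrates only that general idea, not PEDL's actual pipeline; it assumes a hypothetical toy database of known protein pairs and sentences whose protein mentions have already been recognised.

```python
from itertools import combinations


def distant_labels(sentences, known_pairs):
    """Label every protein pair that co-occurs in a sentence.

    sentences: list of (text, [protein mentions]) tuples (hypothetical input).
    known_pairs: set of frozensets of protein names from a toy database.
    Returns (text, protein_a, protein_b, label) tuples, where label is 1 if
    the pair is recorded in the database and 0 otherwise.
    """
    examples = []
    for text, proteins in sentences:
        # Every unordered pair of distinct proteins in the sentence is a candidate.
        for a, b in combinations(sorted(set(proteins)), 2):
            label = int(frozenset((a, b)) in known_pairs)
            examples.append((text, a, b, label))
    return examples


kb = {frozenset(("MDM2", "TP53"))}
sents = [
    ("MDM2 binds TP53 and promotes its degradation.", ["MDM2", "TP53"]),
    ("EGFR and TP53 were both measured.", ["EGFR", "TP53"]),
]
for _, a, b, label in distant_labels(sents, kb):
    print(a, b, label)
```

Labels produced this way are noisy: a sentence that mentions a known pair need not actually state the association, which is why such training data is usually paired with models robust to label noise.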

    Single machine batch scheduling with release times

    Motivated by a high-throughput logging system, we investigate the single machine scheduling problem with batching, where jobs have release times and processing times, and each batch requires a setup time. Our objective is to minimize the total flow time in the online setting. For the online problem where all jobs have identical processing times, we propose a 2-competitive algorithm and prove a corresponding lower bound. Moreover, we show that if jobs with arbitrary processing times can be processed in any order, any online algorithm has a linear competitive ratio in the worst case.
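
The objective can be made concrete by evaluating the total flow time of a fixed batch sequence. The abstract does not spell out the exact model, so this sketch assumes serial batching with batch availability: each batch needs a setup followed by the processing of its jobs, a batch starts only when the machine is idle and all its jobs are released, and every job completes when its batch completes. The function name and example data are hypothetical.

```python
def total_flow_time(batches, release, proc, setup):
    """Total flow time of a given batch sequence on a single machine.

    Assumed model (for illustration only): each batch takes `setup` plus the
    sum of its jobs' processing times, starts once the machine is idle and
    all its jobs are released, and all its jobs complete when it finishes.
    """
    t = 0.0
    total = 0.0
    for batch in batches:
        start = max(t, max(release[j] for j in batch))
        t = start + setup + sum(proc[j] for j in batch)
        # Flow time of a job = completion time - release time.
        total += sum(t - release[j] for j in batch)
    return total


release = [0, 1, 2]
proc = [1, 1, 1]
print(total_flow_time([[0], [1, 2]], release, proc, setup=2))
print(total_flow_time([[0, 1, 2]], release, proc, setup=2))
print(total_flow_time([[0], [1], [2]], release, proc, setup=2))
```

With these numbers, batches {0} and {1, 2} give a total flow time of 14, one batch gives 18 and three singleton batches give 15: batching saves setups but makes early jobs wait, which is the trade-off an online algorithm has to manage without knowing future release times.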

    Research in the Basic Medical Sciences

    Donkii: Can Annotation Error Detection Methods Find Errors in Instruction-Tuning Datasets?

    Instruction-tuning has become an integral part of training pipelines for Large Language Models (LLMs) and has been shown to yield strong performance gains. In an orthogonal line of research, Annotation Error Detection (AED) has emerged as a tool for detecting quality issues in gold-standard labels. So far, however, AED methods have only been applied in discriminative settings, and it is an open question how well they generalize to generative settings, which are becoming widespread with generative LLMs. In this work, we present Donkii, the first benchmark for AED on instruction-tuning data. It encompasses three instruction-tuning datasets enriched with annotations by experts and semi-automatic methods. We find that all three datasets contain clear-cut errors that sometimes propagate directly into instruction-tuned LLMs. We propose four AED baselines for the generative setting and evaluate them comprehensively on the newly introduced dataset. Our results show that choosing the right AED method and model size is crucial, and we derive practical recommendations from them. To gain further insight, we present a first case study examining how the quality of the instruction-tuning datasets influences downstream performance.
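
One common family of AED methods flags instances that a trained model fits poorly. The sketch below ranks instruction-response pairs by mean per-token negative log-likelihood; it is a generic illustration, not claimed to be one of Donkii's four baselines, and the per-token log-probabilities are hypothetical stand-ins for what a language model would produce.

```python
import math


def rank_by_mean_nll(token_logprobs):
    """Rank instances from most to least suspicious.

    token_logprobs: entry i holds the log-probabilities a model assigned to
    the response tokens of instance i (hypothetical input).
    Returns (index, mean_nll) pairs sorted by descending mean NLL.
    """
    scores = []
    for i, lps in enumerate(token_logprobs):
        # Averaging over tokens keeps long responses from dominating the ranking.
        mean_nll = -sum(lps) / len(lps)
        scores.append((i, mean_nll))
    return sorted(scores, key=lambda s: s[1], reverse=True)


logprobs = [
    [math.log(0.9), math.log(0.8)],
    [math.log(0.1), math.log(0.2), math.log(0.05)],
    [math.log(0.6)],
]
for idx, score in rank_by_mean_nll(logprobs):
    print(idx, round(score, 3))
```

A higher mean NLL means the model finds the response less plausible, so those instances would be inspected first; whether this works well depends on the model size and method chosen, which is the question the benchmark addresses.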

    BELB: a Biomedical Entity Linking Benchmark

    Biomedical entity linking (BEL) is the task of grounding entity mentions to a knowledge base. It plays a vital role in information extraction pipelines for the life sciences literature. We review recent work in the field and find that, as the task is absent from existing benchmarks for biomedical text mining, different studies adopt different experimental setups, making comparisons based on published numbers problematic. Furthermore, neural systems are tested primarily on instances linked to the broad-coverage knowledge base UMLS, leaving their performance on more specialized ones, e.g. genes or variants, understudied. We therefore developed BELB, a Biomedical Entity Linking Benchmark, providing access in a unified format to 11 corpora linked to 7 knowledge bases and spanning six entity types: gene, disease, chemical, species, cell line and variant. BELB greatly reduces the preprocessing overhead of testing BEL systems on multiple corpora, offering a standardized testbed for reproducible experiments. Using BELB, we perform an extensive evaluation of six rule-based entity-specific systems and three recent neural approaches leveraging pre-trained language models. Our results reveal a mixed picture: neural approaches fail to perform consistently across entity types, highlighting the need for further studies towards entity-agnostic models.
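
Many rule-based linkers ground a mention by looking up its normalised surface form among a knowledge base's names and synonyms. The sketch below shows only that core step under simple assumptions: a tiny hypothetical knowledge base with illustrative identifiers, and normalisation by lowercasing and stripping non-alphanumeric characters. Real systems add abbreviation resolution, species disambiguation and fuzzy matching.

```python
import re


def normalize(name):
    """Lowercase and drop non-alphanumerics so 'TNF-alpha' matches 'TNF alpha'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_index(kb):
    """Map each normalised synonym to the set of identifiers that use it."""
    index = {}
    for kb_id, synonyms in kb.items():
        for syn in synonyms:
            index.setdefault(normalize(syn), set()).add(kb_id)
    return index


def link(mention, index):
    """Return sorted candidate identifiers for a mention (empty if unmatched)."""
    return sorted(index.get(normalize(mention), set()))


# Hypothetical knowledge base; identifiers are illustrative only.
kb = {
    "GENE:7124": ["TNF", "TNF-alpha", "tumor necrosis factor"],
    "GENE:7157": ["TP53", "p53"],
}
index = build_index(kb)
print(link("TNF alpha", index))
print(link("P53", index))
print(link("BRCA1", index))
```

Returning a set of candidates rather than a single identifier makes ambiguity explicit: if two entries share a synonym, both are returned and a later step has to choose between them.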

    NLProlog: Reasoning with Weak Unification for Question Answering in Natural Language

    Rule-based models are attractive for various tasks because they inherently lead to interpretable and explainable decisions and can easily incorporate prior knowledge. However, such systems are difficult to apply to problems involving natural language because of its linguistic variability. In contrast, neural models cope very well with ambiguity by learning distributed representations of words and their composition from data, but they are difficult to interpret. In this paper, we describe a model that combines neural networks with logic programming in a novel manner to solve multi-hop reasoning tasks over natural language. Specifically, we propose to extend a Prolog prover so that it uses a similarity function over pretrained sentence encoders. We fine-tune the representations for the similarity function via backpropagation. The resulting system can apply rule-based reasoning to natural language and induce domain-specific rules from training data. We evaluate the proposed system on two different question answering tasks, showing that it outperforms two baselines, BIDAF (Seo et al., 2016a) and FAST QA (Weissenborn et al., 2017b), on a subset of the WikiHop corpus and achieves competitive results on the MedHop data set (Welbl et al., 2017). Comment: ACL 201
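
In weak unification, two symbols that are not identical can still unify, with a score given by a similarity function. The toy sketch below rests on stated assumptions: hand-made 2-d vectors stand in for the pretrained sentence encoders, cosine similarity is the similarity function, and symbol scores are aggregated with the minimum, a common choice in weak-unification provers. Symbol names and vectors are hypothetical, and proving is reduced to matching a query against single facts.

```python
import math

# Hypothetical 2-d embeddings standing in for pretrained sentence encoders.
EMB = {
    "born_in": (1.0, 0.1),
    "birthplace_of": (0.9, 0.2),
    "capital_of": (0.1, 1.0),
}


def similarity(a, b):
    """Unification score of two symbols: 1.0 if identical, cosine if both embedded, else 0.0."""
    if a == b:
        return 1.0
    if a in EMB and b in EMB:
        u, v = EMB[a], EMB[b]
        dot = sum(x * y for x, y in zip(u, v))
        # Clip at 0 so dissimilar symbols never yield a negative score.
        return max(0.0, dot / (math.hypot(*u) * math.hypot(*v)))
    return 0.0


def weak_unify(atom1, atom2):
    """Weakly unify two ground atoms; the score is the minimum symbol score."""
    if len(atom1) != len(atom2):
        return 0.0
    return min(similarity(x, y) for x, y in zip(atom1, atom2))


def prove(query, facts):
    """Best score for proving a query directly from a single fact."""
    return max((weak_unify(query, f) for f in facts), default=0.0)


facts = [("born_in", "einstein", "ulm")]
print(round(prove(("birthplace_of", "einstein", "ulm"), facts), 3))
print(round(prove(("capital_of", "einstein", "ulm"), facts), 3))
```

Because the score is a minimum, one poor symbol match caps the whole unification, while identical atoms score 1.0 exactly as in ordinary Prolog unification.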

    Matrix gla protein (MGP): an overexpressed and migration-promoting mesenchymal component in glioblastoma

    Background: Recent studies have demonstrated that a molecular subtype of glioblastoma is characterized by overexpression of extracellular matrix (ECM)/mesenchymal components and shorter survival. Specifically, gene expression profiling studies revealed that matrix gla protein (MGP), whose function has traditionally been linked to inhibition of calcification of arteries and cartilage, is overexpressed in glioblastomas and associated with worse outcome. Methods: To analyze the role of MGP in glioblastomas, we performed expression, migration and proliferation studies. Results: Real-time PCR and ELISA assays confirmed overexpression of MGP in glioblastoma biopsy specimens and cell lines at mRNA and protein levels as compared to normal brain tissue. Immunohistochemistry verified positivity of glial tumor cells for MGP. RNAi-mediated knockdown of MGP in three glioma cell lines (U343MG, U373MG, H4) led to marked reduction of migration, as demonstrated by wound healing and transwell assays, while no effect on proliferation was seen. Conclusion: Our data suggest that upregulation of MGP (and possibly other ECM-related components as well) results in unfavorable prognosis via increased migration.

    Movement amplitude on the Functional Re-adaptive Exercise Device: deep spinal muscle activity and movement control

    Purpose: Lumbar multifidus (LM) and transversus abdominis (TrA) show altered motor control, and LM is atrophied, in people with low-back pain (LBP). The Functional Re-adaptive Exercise Device (FRED) involves cyclical lower-limb movement against minimal resistance in an upright posture. It has been shown to recruit LM and TrA automatically, and may have potential as an intervention for non-specific LBP. However, no studies have yet investigated the effects of changes in FRED movement amplitude on the activity of these muscles. This study aimed to assess the effects of different FRED movement amplitudes on LM and TrA muscle thickness and movement variability, in order to inform an evidence-based exercise prescription. Methods: Lumbar multifidus and TrA thickness of eight healthy male volunteers were examined using ultrasound imaging during FRED exercise at four different movement amplitudes, normalised to rest. Movement variability was also measured. Magnitude-based inferences were used to compare each amplitude. Results: Exercise at all amplitudes recruited LM and TrA more than rest, with thickness increases of approximately 5 mm and 1 mm, respectively. Larger amplitudes also caused increased TrA thickness, LM and TrA muscle thickness variability, and movement variability. The data suggest that all amplitudes are useful for recruiting LM and TrA. Conclusions: A progressive training protocol should start at the smallest amplitude and increase the setting once participants can maintain a consistent movement speed, in order to continue to challenge the motor control system.

    E-Mail Management: A Techno-Managerial Research Perspective

    A panel session on e-mail management was organized at ICIS 2005 in Las Vegas, Nev. The panelists provided perspectives from industry as well as academia and discussed various problems in e-mail management, research methodologies to address these problems, research opportunities, and an integrative framework for research on e-mail management. This paper briefly summarizes the panelists' presentations and the issues raised by the audience. A rich bibliography and Web links are provided at the end for researchers interested in this area.